Feature Normalization Using Structured Full Transforms for Robust Speech Recognition

Authors

  • Xiong Xiao
  • Jinyu Li
  • Chng Eng Siong
  • Haizhou Li
Abstract

Classical mean and variance normalization (MVN) uses a diagonal transform and a bias vector to normalize the mean and variance of noisy features to reference values. Because MVN uses a diagonal transform, it ignores the correlation between feature dimensions. Although a full transform can exploit feature correlation, its large number of parameters may not be estimated reliably from a short observation, e.g., a single utterance. We propose a novel structured full transform that has the same number of free parameters as a diagonal transform while still capturing the correlation between feature dimensions. The proposed structured transform can be estimated reliably from one utterance by maximizing the likelihood of the normalized features on a reference Gaussian mixture model (GMM). Experimental results on the Aurora4 task show that the structured transform consistently produces better speech recognition results than the diagonal transform and also outperforms the advanced front-end (AFE) feature extractor.
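The abstract contrasts MVN's diagonal transform with a full transform and mentions estimating a transform by maximizing likelihood on a reference GMM. The following is a minimal pure-Python sketch under stated assumptions, not the paper's method: `mvn` implements standard per-utterance MVN as a per-dimension scale plus bias; `param_counts` compares free-parameter counts (the 39-dimensional feature size in the usage line is an assumed, typical MFCC-plus-derivatives size); `gmm_loglik` and `diag_transform_objective` are hypothetical helpers showing a constrained-MLLR-style likelihood objective for a diagonal transform. The abstract does not specify the structure of the proposed full transform, so it is not reproduced here.

```python
# Illustrative sketch only: helper names are hypothetical and this does NOT
# reproduce the paper's structured full transform.
import math
from typing import Dict, List, Sequence

Frames = List[List[float]]


def mvn(frames: Sequence[Sequence[float]],
        ref_mean: Sequence[float],
        ref_std: Sequence[float]) -> Frames:
    """Per-utterance MVN as a diagonal transform plus a bias vector.

    Dimension d is mapped as y_d = a_d * x_d + b_d, where
    a_d = ref_std[d] / std_d and b_d = ref_mean[d] - a_d * mean_d,
    so each output dimension has the reference mean and standard deviation.
    Dimensions are treated independently, so correlations are ignored.
    """
    n, dim = len(frames), len(frames[0])
    out = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        col = [f[d] for f in frames]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        std = math.sqrt(var) if var > 0 else 1.0  # avoid dividing by zero on constant dims
        a = ref_std[d] / std
        b = ref_mean[d] - a * mean
        for t in range(n):
            out[t][d] = a * frames[t][d] + b
    return out


def param_counts(dim: int) -> Dict[str, int]:
    """Free parameters of an affine transform y = A x + b on dim-dimensional features."""
    return {"diagonal": 2 * dim,        # dim scales + dim biases
            "full": dim * dim + dim}    # dim x dim matrix + dim biases


def gmm_loglik(frames: Sequence[Sequence[float]],
               weights: Sequence[float],
               means: Sequence[Sequence[float]],
               variances: Sequence[Sequence[float]]) -> float:
    """Total log-likelihood of frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp.append(ll)
        m = max(comp)
        total += m + math.log(sum(math.exp(c - m) for c in comp))  # log-sum-exp
    return total


def diag_transform_objective(frames: Sequence[Sequence[float]],
                             scale: Sequence[float],
                             bias: Sequence[float],
                             weights: Sequence[float],
                             means: Sequence[Sequence[float]],
                             variances: Sequence[Sequence[float]]) -> float:
    """CMLLR-style objective for a diagonal transform y = scale * x + bias.

    Adds T * log|det A| (for a diagonal A, the sum of log|scale_d|) to the GMM
    log-likelihood of the transformed frames, so that shrinking all features
    toward a single point is not rewarded.
    """
    y = [[s * xd + c for xd, s, c in zip(x, scale, bias)] for x in frames]
    log_det = sum(math.log(abs(s)) for s in scale)
    return gmm_loglik(y, weights, means, variances) + len(frames) * log_det


print(param_counts(39))  # → {'diagonal': 78, 'full': 1560}
```

With an assumed 39-dimensional feature vector, the diagonal transform has 78 free parameters while an unconstrained full transform has 1560, which illustrates why the abstract says a full transform may not be estimated reliably from a single utterance. Whether the paper's likelihood objective includes the log-determinant term is not stated in the abstract; it is included here because constrained-MLLR-style estimation normally uses it.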

Similar resources

Improving the performance of MFCC for Persian robust speech recognition

The Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step that applies to t...

A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced by the remaining components and statistical ...

Multi-eigenspace normalization for robust speech recognition in noisy environments

In this paper, we propose an effective feature normalization scheme based on eigenspace normalization for achieving robust speech recognition. In general, mean and variance normalization (MVN) is implemented in the cepstral domain. However, another MVN approach using an eigenspace was recently introduced, in which the eigenspace normalization procedure performs normalization in a single eigenspace. Th...

Linear Transforms in Automatic Speech Recognition: Estimation Procedures and Integration of Diverse Acoustic Data

Linear transforms have been used extensively for both training and adaptation of Hidden Markov Model (HMM) based automatic speech recognition (ASR) systems. Two important applications of linear transforms in acoustic modeling are the decorrelation of the feature vector and the constrained adaptation of the acoustic models to the speaker, the channel, and the task. Our focus in the first part of...

Noise robust speaker verification with delta cepstrum normalization

This paper introduces a delta cepstrum normalization (DCN) technique for speaker verification under noisy conditions. Cepstral feature normalization techniques are widely used to mitigate spectral variations caused by various types of noise; however, little attention has been paid to normalizing delta features. A DCN technique that normalizes not only base features but also delta-features was r...

Publication date: 2011